SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning

Authors

  • Tarek Elgamal
  • Shangyu Luo
  • Matthias Boehm
  • Alexandre V. Evfimievski
  • Shirish Tatikonda
  • Berthold Reinwald
  • Prithviraj Sen

Abstract

Systems for declarative large-scale machine learning (ML) algorithms aim at high-level algorithm specification and automatic optimization of runtime execution plans. State-of-the-art compilers rely on algebraic rewrites and operator selection, including fused operators to avoid materialized intermediates, reduce memory bandwidth requirements, and exploit sparsity across chains of operations. However, the unlimited number of relevant patterns for rewrites and operators poses challenges in terms of development effort and high performance impact. Query compilation has been studied extensively in the database literature, but ML programs additionally require handling linear algebra and exploiting algebraic properties, DAG structures, and sparsity. In this paper, we introduce Spoof, an architecture to automatically (1) identify algebraic simplification rewrites, and (2) generate fused operators in a holistic framework. We describe a snapshot of the overall system, including key techniques of sum-product optimization and code generation. Preliminary experiments show performance close to hand-coded fused operators, significant improvements over a baseline without fused operators, and moderate compilation overhead.
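
As a rough illustration of the two ideas named in the abstract, the Python sketch below contrasts a plan that materializes an intermediate with (a) an algebraic sum-product rewrite and (b) a fused single-pass operator. It is not Spoof's implementation: the function names are hypothetical, the matrices are plain nested lists, and the rewrite shown, sum(A B) = sum_k colSums(A)[k] * rowSums(B)[k], is a standard identity chosen here only as an example.

```python
def sum_of_product_naive(A, B):
    """Materialize the full n x m product A @ B, then sum all of its cells."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
         for i in range(n)]  # materialized intermediate
    return sum(sum(row) for row in C)


def sum_of_product_rewritten(A, B):
    """Algebraic rewrite: sum(A @ B) == sum_k colSums(A)[k] * rowSums(B)[k].

    No n x m intermediate is built; only two length-k vectors are needed.
    """
    col_sums = [sum(col) for col in zip(*A)]
    row_sums = [sum(row) for row in B]
    return sum(c * r for c, r in zip(col_sums, row_sums))


def fused_sum_xyz(X, Y, Z):
    """Fused operator sketch: sum(X * Y * Z) cell-wise in a single pass.

    The unfused plan would first materialize X * Y, then (X * Y) * Z.
    """
    total = 0
    for row_x, row_y, row_z in zip(X, Y, Z):
        for x, y, z in zip(row_x, row_y, row_z):
            total += x * y * z
    return total


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Z = [[1, 0], [0, 1]]
print(sum_of_product_naive(A, B) == sum_of_product_rewritten(A, B))
print(fused_sum_xyz(A, B, Z))
```

Both variants return the same value as the naive plan; the difference lies only in how much intermediate data is allocated and scanned, which is the cost that rewrites and fused operators aim to reduce.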

Similar resources

On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML

Many large-scale machine learning (ML) systems allow specifying custom ML algorithms by means of linear algebra programs, and then automatically generate efficient execution plans. In this context, optimization opportunities for fused operators—in terms of fused chains of basic operators—are ubiquitous. These opportunities include (1) fewer materialized intermediates, (2) fewer scans of input d...

Two-stage fuzzy-stochastic programming for parallel machine scheduling problem with machine deterioration and operator learning effect

This paper deals with the determination of machine numbers and production schedules in manufacturing environments. In this line, a two-stage fuzzy stochastic programming model is discussed with fuzzy processing times where both deterioration and learning effects are evaluated simultaneously. The first stage focuses on the type and number of machines in order to minimize the total costs associat...

A Mathematical Programming Model and Genetic Algorithm for a Multi-Product Single Machine Scheduling Problem with Rework Processes

In this paper, a multi-product single machine scheduling problem with the possibility of producing defective jobs is considered. We consider rework in the scheduling environment and propose a mixed-integer programming (MIP) model for the problem. Based on the philosophy of just-in-time production, minimization of the sum of earliness and tardiness costs is taken into account as the objective fu...

Extended and infinite ordered weighted averaging and sum operators with numerical examples

This study discusses some variants of Ordered Weighted Averaging (OWA) operators and related information aggregation methods. In detail, we define the Extended Ordered Weighted Sum (EOWS) operator and the Extended Ordered Weighted Averaging (EOWA) operator, which are applied in scientometrics evaluation where the preference is over finitely many representative works. As...

Comparative Analysis of Machine Learning Algorithms with Optimization Purposes

The fields of optimization and machine learning increasingly interact, and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms run in reasonable computational time for specific classes of problems and play an important role in extracting knowledge from large amounts of data. In this paper, a methodology has been employed to opt...

Publication date: 2017